Sensor visibility is crucial for safety-critical applications in automotive, robotics, smart infrastructure, and other domains: beyond object detection and occupancy mapping, visibility describes where a sensor can potentially measure and where it is blind. This knowledge can enhance functional safety and perception algorithms, or optimize sensor topologies. Despite its significance, to the best of our knowledge, neither a common definition of visibility nor performance metrics for it exist yet. We close this gap by providing a definition of visibility derived from a review of use cases, and we introduce metrics and a framework to assess the performance of visibility estimators. Our metrics are verified with labeled real-world and simulation data from infrastructure radars and cameras: the framework readily identifies false-visible and false-invisible estimations, which are safety-critical. Applying our metrics, we enhance the radar and camera visibility estimators by modeling the 3D elevation of the sensor and objects. This refinement outperforms the conventional planar 2D approach in trustworthiness and thus safety.
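The abstract leaves the metric definitions to the paper itself; as a minimal illustration, a visibility estimate can be scored against ground truth cell by cell, counting the two safety-critical error types separately. A hedged sketch, assuming boolean visibility grids; the function name and grid representation are illustrative, not the paper's implementation:

```python
# Hedged sketch: cell-wise comparison of an estimated visibility map against
# ground truth, assuming both are boolean grids (True = visible). Metric names
# follow the abstract's terminology; the implementation is illustrative only.
import numpy as np

def visibility_errors(estimated: np.ndarray, ground_truth: np.ndarray) -> dict:
    """Count safety-critical disagreements between two boolean visibility grids."""
    false_visible = np.logical_and(estimated, ~ground_truth)    # claimed visible, actually blind
    false_invisible = np.logical_and(~estimated, ground_truth)  # claimed blind, actually visible
    n = estimated.size
    return {
        "false_visible_rate": false_visible.sum() / n,
        "false_invisible_rate": false_invisible.sum() / n,
        "agreement": (estimated == ground_truth).mean(),
    }

# Example: a 2D grid around an infrastructure sensor, with random dummy data.
est = np.random.rand(100, 100) > 0.3
gt = np.random.rand(100, 100) > 0.3
print(visibility_errors(est, gt))
```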
Offline reinforcement learning (offline RL) is an emerging field that has recently begun attracting attention across various application domains thanks to its ability to learn behavior from previously collected datasets. When further interaction with the environment is expensive (computationally or otherwise), unsafe, or entirely infeasible, logged data must be used. Offline RL has proven very successful, paving the way to solving previously intractable real-world problems, and we aim to generalize this paradigm to multi-agent or multiplayer-game settings. Very little research has been done in this area, as progress is hindered by the lack of standardized datasets and meaningful benchmarks. In this work, we coin the term Offline Equilibrium Finding (OEF) to describe this area, and we construct multiple datasets consisting of strategies collected with several established methods across a variety of games. We also propose a benchmark method: an amalgamation of behavior cloning and model-based algorithms. Our two model-based algorithms, OEF-PSRO and OEF-CFR, are adaptations of the widely used equilibrium-finding algorithms Deep CFR and PSRO to the offline-learning setting. In the empirical part, we evaluate the performance of the benchmark algorithms on the constructed datasets. We hope our efforts help accelerate research on large-scale equilibrium finding. Datasets and code are available at https://github.com/securitygames/oef.
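To make the behavior-cloning half of the benchmark concrete, the sketch below fits a softmax policy to logged (infostate, action) pairs by minimizing cross-entropy. All shapes, the feature encoding, and the linear policy are assumptions for illustration; the paper's OEF-PSRO and OEF-CFR algorithms are substantially more involved:

```python
# Hedged sketch of a behavior-cloning baseline on an offline game dataset:
# a linear softmax policy trained with cross-entropy via gradient descent.
import numpy as np

rng = np.random.default_rng(0)
n, d, n_actions = 1000, 16, 4
X = rng.normal(size=(n, d))             # infostate features from the offline dataset (assumed)
a = rng.integers(0, n_actions, size=n)  # actions taken by the logged strategies (assumed)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

W = np.zeros((d, n_actions))
lr = 0.1
for _ in range(200):
    probs = softmax(X @ W)
    grad = probs.copy()
    grad[np.arange(n), a] -= 1.0        # d(cross-entropy)/d(logits) = probs - one_hot
    W -= lr * (X.T @ grad) / n

probs = softmax(X @ W)
nll = -np.log(probs[np.arange(n), a]).mean()
print(f"final behavior-cloning NLL: {nll:.3f}")
```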
Equipped with a broad range of sensors, mainstream autonomous driving solutions are becoming increasingly oriented toward safe system design. Although these sensors have laid a solid foundation, most production solutions to date still fall into the L2 stage. Among these, Comma.ai came into our sight, claiming that a $999 aftermarket device mounted with a single camera and a board inside has the ability to handle L2 scenarios. Together with the open-source software of the entire system released by Comma.ai, the project is named Openpilot. Is it possible? If so, how is it made possible? Out of curiosity, we dove deep into Openpilot and concluded that the key to its success is the end-to-end system design rather than a conventional modular framework. The model, briefly introduced as SuperCombo, can predict the ego vehicle's future trajectory and other road semantics from monocular input. Unfortunately, the training process and the massive amount of data that make all this work are not publicly available. To conduct an in-depth investigation, we attempt to reimplement the training details and test the pipeline on public benchmarks. The refactored network proposed in this work is referred to as OP-Deepdive. For a fair comparison of our version with the original SuperCombo, we introduce a dual-model deployment scheme to test real-world driving performance. Experimental results on nuScenes, Comma2k19, CARLA, and in-house realistic scenarios demonstrate that a low-cost device can indeed achieve most L2 functionalities and be on par with the original SuperCombo model. In this report, we share our latest findings, shed light on a new perspective on end-to-end autonomous driving from an industrial, product-level viewpoint, and hope to inspire the community to continue improving performance. Our code and benchmarks are at https://github.com/openperceptionx/openpilot-deepdive.
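As a concrete reference for how such trajectory predictions are typically scored on public benchmarks, the sketch below computes the standard average and final displacement errors (ADE/FDE). The array shapes and dummy trajectories are assumptions; this is not OP-Deepdive's evaluation code:

```python
# Hedged sketch: displacement metrics commonly used to score predicted ego
# trajectories on benchmarks such as nuScenes and Comma2k19.
import numpy as np

def ade_fde(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """pred, gt: (T, 2) arrays of future (x, y) waypoints in meters."""
    dists = np.linalg.norm(pred - gt, axis=1)  # per-step Euclidean error
    return dists.mean(), dists[-1]             # ADE, FDE

pred = np.cumsum(np.full((20, 2), 0.5), axis=0)  # a straight dummy prediction
gt = np.cumsum(np.full((20, 2), 0.45), axis=0)   # a dummy ground-truth trajectory
ade, fde = ade_fde(pred, gt)
print(f"ADE = {ade:.2f} m, FDE = {fde:.2f} m")
```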
Error correction is widely used in automatic speech recognition (ASR) to post-process the generated sentence and can further reduce the word error rate (WER). Although multiple candidates are generated by an ASR system through beam search, current error correction approaches can only correct one sentence at a time, failing to leverage the voting effect among multiple candidates to better detect and correct error tokens. In this work, we propose FastCorrect 2, an error correction model that takes multiple ASR candidates as input for better correction accuracy. FastCorrect 2 adopts non-autoregressive generation for fast inference; it consists of an encoder that processes multiple source sentences and a decoder that generates the target sentence in parallel from the adjusted source sentence, where the adjustment is based on the predicted duration of each source token. However, handling multiple source sentences raises some issues. First, it is non-trivial to leverage the voting effect among multiple source sentences since they usually vary in length. Thus, we propose a novel alignment algorithm to maximize the degree of token alignment among multiple sentences in terms of token and pronunciation similarity. Second, the decoder can only take one adjusted source sentence as input, while there are multiple source sentences. Thus, we develop a candidate predictor to detect the most suitable candidate for the decoder. Experiments on our in-house dataset and AISHELL-1 show that FastCorrect 2 can further reduce the WER over the previous single-candidate correction model by 3.2% and 2.6%, respectively, demonstrating the effectiveness of leveraging multiple candidates in ASR error correction. FastCorrect 2 achieves better performance than the cascaded re-scoring and correction pipeline and can serve as a unified post-processing module for ASR.
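A hedged sketch of the core sub-problem follows: pairwise token alignment between two candidates of different lengths via dynamic programming. The scoring (exact match plus a crude string-similarity proxy for pronunciation similarity) is an assumption; FastCorrect 2's actual algorithm aligns all candidates jointly over token and pronunciation similarity:

```python
# Hedged sketch: Needleman-Wunsch-style alignment of two ASR candidates.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    if a == b:
        return 2.0                              # reward exact token match
    return SequenceMatcher(None, a, b).ratio()  # crude pronunciation proxy

def align(src: list[str], tgt: list[str], gap: float = -0.5):
    n, m = len(src), len(tgt)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + similarity(src[i - 1], tgt[j - 1]),
                score[i - 1][j] + gap,          # deletion
                score[i][j - 1] + gap,          # insertion
            )
    # Backtrack to recover the aligned token pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + similarity(src[i - 1], tgt[j - 1]):
            pairs.append((src[i - 1], tgt[j - 1])); i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            pairs.append((src[i - 1], None)); i -= 1
        else:
            pairs.append((None, tgt[j - 1])); j -= 1
    while i > 0:
        pairs.append((src[i - 1], None)); i -= 1
    while j > 0:
        pairs.append((None, tgt[j - 1])); j -= 1
    return list(reversed(pairs))

print(align("i red a book".split(), "i read a book".split()))
```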
We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras and two stereo cameras, in addition to lidar point clouds and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest-ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with predicting the future motion of "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD map with 3D lane and crosswalk geometry - sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.
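For illustration, the per-actor information described above can be modeled with a plain data structure, as in the hedged sketch below; all field and function names are hypothetical, and the official av2 devkit defines its own schema:

```python
# Hedged sketch of the per-actor track information a forecasting model consumes.
from dataclasses import dataclass, field

@dataclass
class ActorTrack:
    actor_id: str
    category: str                    # e.g. "vehicle", "pedestrian", "cyclist"
    is_scored: bool                  # models are only evaluated on scored actors
    xy: list[tuple[float, float]] = field(default_factory=list)      # observed positions (m)
    heading: list[float] = field(default_factory=list)               # yaw per step (rad)
    velocity: list[tuple[float, float]] = field(default_factory=list)

def scored_actors(tracks: list[ActorTrack]) -> list[ActorTrack]:
    """Keep only the actors whose future motion the model must predict."""
    return [t for t in tracks if t.is_scored]

demo = [ActorTrack("a1", "vehicle", True), ActorTrack("a2", "pedestrian", False)]
print([t.actor_id for t in scored_actors(demo)])  # ['a1']
```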
Autonomous vehicles are being deployed with a spectrum of capability, extending from driver assistance features for the highway in personal vehicles (SAE Level 2+) to fully autonomous fleet ride sharing services operating in complex city environments (SAE Level 4+). This spectrum of autonomy often operates in different physical environments with different degrees of assumed driver-in-the-loop oversight, and hence has very different system and subsystem requirements. At the heart of SAE Level 2 to 5 systems is localization and mapping, which ranges from road determination for feature geofencing or high-level routing, through lane determination for advanced driver assistance, to where-in-lane positioning for full vehicle control. We assess localization and mapping requirements for different levels of autonomy and supported features. This work provides a framework for system decomposition, including the level of redundancy needed to achieve the target level of safety. We examine several representative autonomous and assistance features and make recommendations on positioning requirements as well as on map georeferencing and information integrity.
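As a hedged worked example of a where-in-lane requirement: if the vehicle is centered, the allowable lateral drift before touching a lane line is half the free space between vehicle and lane edges. The numbers below are illustrative, not the paper's recommendations:

```python
# Hedged worked example of a lateral positioning bound for lane keeping.
lane_width = 3.6      # m, typical freeway lane (illustrative)
vehicle_width = 2.0   # m, typical passenger car (illustrative)

# A centered vehicle may drift this far laterally before touching a lane line:
alert_limit = (lane_width - vehicle_width) / 2
print(f"lateral alert limit: {alert_limit:.2f} m")  # 0.80 m

# A localization system would then be sized so its error exceeds this bound
# only with a probability consistent with the target integrity risk.
```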
This work builds on the models and concepts presented in Part I to learn approximate dictionary representations of Koopman operators from data. Part I of this paper presented a methodology for arguing the subspace invariance of a Koopman dictionary, demonstrated on the state-inclusive logistic lifting (SILL) basis: an affine basis augmented with conjunctive logistic functions. The SILL dictionary's nonlinear functions are homogeneous, as is the norm in data-driven dictionary learning of Koopman operators. In this paper, we discover that structured mixing of heterogeneous dictionary functions, drawn from different classes of nonlinear functions, achieves the same accuracy and dimensional scaling as the deep-learning-based deepDMD algorithm. We specifically show this by building a heterogeneous dictionary composed of SILL functions and conjunctive radial basis functions (RBFs). This mixed dictionary achieves the same accuracy and dimensional scaling as deepDMD with an order-of-magnitude reduction in parameters, while maintaining geometric interpretability. These results strengthen the viability of dictionary-based Koopman models for solving high-dimensional nonlinear learning problems.
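A hedged sketch of the underlying fitting step: extended DMD with a heterogeneous dictionary mixing affine terms, logistic functions, and Gaussian RBFs on a toy system. The random centers, the simplified (non-conjunctive) logistic features, and the toy dynamics are assumptions; the paper's SILL/RBF construction is structured rather than random:

```python
# Hedged sketch: extended DMD with a mixed (affine + logistic + RBF) dictionary.
import numpy as np

rng = np.random.default_rng(0)

def lift(X, centers, steep=5.0, gamma=4.0):
    """Map states (n, d) into dictionary space: [1, x, logistic, RBF] features."""
    ones = np.ones((X.shape[0], 1))
    diffs = X[:, None, :] - centers                       # (n, n_centers, d)
    logistic = 1.0 / (1.0 + np.exp(-steep * diffs.sum(axis=2)))
    rbf = np.exp(-gamma * (diffs ** 2).sum(axis=2))
    return np.hstack([ones, X, logistic, rbf])

# Toy nonlinear system x+ = 0.9 x + 0.1 sin(x), sampled as snapshot pairs.
X = rng.uniform(-1, 1, size=(500, 2))
Y = 0.9 * X + 0.1 * np.sin(X)

centers = rng.uniform(-1, 1, size=(10, 2))
PsiX, PsiY = lift(X, centers), lift(Y, centers)

# Least-squares Koopman matrix K such that Psi(x+) ~= Psi(x) K.
K, *_ = np.linalg.lstsq(PsiX, PsiY, rcond=None)
residual = np.linalg.norm(PsiX @ K - PsiY) / np.linalg.norm(PsiY)
print("dictionary size:", PsiX.shape[1], "relative one-step lifted residual:", residual)
```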
Generative models such as DALL-E 2 could represent promising future tools for image generation, augmentation, and manipulation in artificial intelligence research in radiology, provided that these models have sufficient medical domain knowledge. Here, we demonstrate that DALL-E 2 has learned relevant representations of X-ray images, with promising capabilities in zero-shot text-to-image generation, in continuing an image beyond its original boundaries, and in removing elements, although the generation of pathology and of CT, MRI, and ultrasound images remains limited. Thus, using generative models to augment and generate radiological data appears feasible, even though further fine-tuning and adaptation of these models to the medical domain is required beforehand.
Speech entity linking aims to recognize and disambiguate named entities in speech. Conventional methods suffer severely from the unconstrained speech style and the noisy transcripts produced by ASR systems. In this paper, we propose a novel approach named Knowledge-Enhanced Named Entity Recognition (KENER), which strives to improve robustness by painlessly incorporating appropriate knowledge in the entity-recognition stage, thereby improving the overall performance of entity linking. KENER first retrieves candidate entities for a sentence without mentions, and then uses the entity descriptions as additional information to help recognize mentions. The candidate entities retrieved by a dense-retrieval module are especially useful when the input is short or noisy. Moreover, we study various data-sampling strategies and design effective loss functions to improve the quality of the retrieved entities in both the recognition and disambiguation stages. Finally, a linking-with-filtering module is applied as the final safeguard, making it possible to filter out wrongly recognized mentions. Our system achieves first place in Track 1 of NLPCC-2022 Shared Task 2.
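A hedged sketch of the dense-retrieval step follows: candidate entities are ranked by the cosine similarity between an utterance embedding and entity-description embeddings, keeping the top-k. The random embeddings stand in for a trained encoder; KENER's retriever is learned with the sampling strategies and losses mentioned above:

```python
# Hedged sketch: top-k candidate entity retrieval by cosine similarity.
import numpy as np

rng = np.random.default_rng(0)
dim, n_entities, k = 128, 10_000, 5

# Unit-normalized entity-description embeddings (random stand-ins for a trained encoder).
entity_emb = rng.normal(size=(n_entities, dim))
entity_emb /= np.linalg.norm(entity_emb, axis=1, keepdims=True)

# Unit-normalized embedding of the input utterance.
query_emb = rng.normal(size=dim)
query_emb /= np.linalg.norm(query_emb)

scores = entity_emb @ query_emb        # cosine similarity, since all vectors are unit length
top_k = np.argsort(scores)[::-1][:k]   # indices of the best candidate entities
print(top_k, scores[top_k])
```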
The recent global surge of interest in smart cities has led to trillions of dollars of investment in research and development. These connected cities have the potential to build a symbiosis of technology and society and to revolutionize the lives, safety, ecological sustainability, and quality of life of societies worldwide. Key components of the smart-city fabric include connected smart grids, autonomous vehicles, federated learning systems, smart utilities, large-scale public transit, and proactive surveillance systems. Although the promise is exciting, these technologies and their subsequent integration cannot be attempted without addressing the potential societal impacts of such highly automated, data-sharing systems. Moreover, the feasibility of coordinating so many disparate tasks will require a fast, scalable, and unified framework. To this end, we propose FARO2, a completely reimagined successor to FARO1, built from the ground up. FARO2 offers the same functionality as its predecessor, acting as a unified biometric API harness that enables seamless evaluation, deployment, and simple pipeline creation for heterogeneous biometric software. FARO2 also provides fully declarative capabilities for defining and coordinating custom machine-learning and sensor pipelines, allowing processing to be distributed across otherwise incompatible hardware and networks. FARO2 ultimately provides a way to quickly configure, hot-swap, and scale large coordinated or federated systems online without interrupting operation for maintenance. Since much of the data collected in smart cities contains personally identifiable information (PII), FARO2 also provides built-in tools and layers to ensure secure, encrypted streaming, storage, and access of PII data across distributed systems.
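As a hedged illustration of what a declarative pipeline description could look like, the sketch below expresses a sensor-to-sink pipeline as plain data and derives an execution order from the declared dependencies. Every key, stage name, and the schema itself are hypothetical; FARO2 defines its own format:

```python
# Hedged sketch: a hypothetical declarative pipeline and its scheduling order.
pipeline = {
    "name": "entry-camera-face-match",
    "stages": [
        {"id": "ingest", "type": "sensor", "source": "rtsp://camera-01.example/stream", "fps": 15},
        {"id": "detect", "type": "model", "task": "face-detection", "after": ["ingest"]},
        {"id": "match", "type": "biometric", "gallery": "badge-db", "after": ["detect"]},
        {"id": "store", "type": "sink", "encrypt": True, "after": ["match"]},  # PII at rest
    ],
}

# A coordinator can topologically order the stages from the declared "after"
# edges and place each one on whatever hardware is available.
order, placed = [], set()
while len(order) < len(pipeline["stages"]):
    for s in pipeline["stages"]:
        if s["id"] not in placed and all(d in placed for d in s.get("after", [])):
            order.append(s["id"]); placed.add(s["id"])
print(order)  # ['ingest', 'detect', 'match', 'store']
```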